Enterprise Database Systems
Data Architecture: Getting Started
Data Architecture Primer
Final Exam: Data Analyst

Data Architecture Primer

Course Number:
it_dsdagsdj_01_enus
Lesson Objectives

Data Architecture Primer

  • Course Overview
  • identify the relationship between data, information, and analytics
  • recognize PII, PHI, and common data privacy regulations
  • list the six phases of the data lifecycle
  • compare and contrast SQL and NoSQL database solutions
  • use Visual Paradigm to create a relational database ERD
  • deploy Microsoft SQL Server in the Amazon Web Services cloud
  • deploy DynamoDB in the Amazon Web Services cloud
  • define what big data is and how it is managed
  • recognize the relationship between data and how it is governed
  • distinguish among the various types of data architectures, including the TOGAF enterprise architecture
  • describe how organizations can derive value from data they already have
  • implement effective data management solutions

Overview/Description

In this 12-video course, learners explore how to define data and its lifecycle, the importance of data privacy, SQL and NoSQL database solutions, and key data management concepts as they relate to big data. First, look at the relationship between data, information, and analytics. Learn to recognize personally identifiable information (PII), protected health information (PHI), and common data privacy regulations. Then study the six phases of the data lifecycle. Compare and contrast SQL and NoSQL database solutions, and use Visual Paradigm to create a relational database entity-relationship diagram (ERD). To implement a SQL solution, Microsoft SQL Server is deployed in the Amazon Web Services (AWS) cloud; a NoSQL solution is implemented by deploying DynamoDB in the AWS cloud. Explore definitions of big data and data governance. Learners then examine various types of data architecture, including TOGAF (The Open Group Architecture Framework) enterprise architecture. Finally, learners study data analytics and reporting and how organizations can derive value from the data they already have. The concluding exercise looks at implementing effective data management solutions.
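
As a companion to the DynamoDB deployment covered in this course, the following is a minimal sketch of creating a NoSQL table in the AWS cloud with Python. It assumes the boto3 library is installed and AWS credentials are already configured; the table and attribute names are hypothetical, and the snippet is an illustration rather than the course's lab steps.

    import boto3

    # Connect to DynamoDB in a chosen AWS region (credentials come from the environment).
    dynamodb = boto3.resource("dynamodb", region_name="us-east-1")

    # Create a simple table with a string partition key and on-demand (pay-per-request) capacity.
    table = dynamodb.create_table(
        TableName="CourseCatalog",  # hypothetical table name
        KeySchema=[{"AttributeName": "course_id", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "course_id", "AttributeType": "S"}],
        BillingMode="PAY_PER_REQUEST",
    )
    table.wait_until_exists()

    # Write one item and read it back to confirm the table is live.
    table.put_item(Item={"course_id": "it_dsdagsdj_01_enus", "title": "Data Architecture Primer"})
    print(table.get_item(Key={"course_id": "it_dsdagsdj_01_enus"})["Item"])

Because DynamoDB is schemaless beyond its key attributes, each item can carry different attributes, which is the core contrast with the relational SQL Server deployment covered alongside it.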



Target

Prerequisites: none

Final Exam: Data Analyst

Course Number:
it_fedads_01_enus
Lesson Objectives

Final Exam: Data Analyst

  • build and run the application and confirm the output using HDFS from both the command line and the web application
  • compare and contrast SQL and NoSQL database solutions
  • configure a JDBC connection on Glue to the Redshift cluster
  • configure and view permissions for individual files and directories using the getfacl and chmod commands
  • configure HDFS using the hdfs-site.xml file and identify the properties which can be set from it
  • crawl data stored in a DynamoDB table
  • create and configure a Hadoop cluster on the Google Cloud Platform using its Cloud Dataproc service
  • create and configure simple graphs with lines and markers using the Matplotlib data visualization library
  • create and load data into an RDD
  • create data frames in R
  • create matrices in R
  • create vectors in R
  • define linear regression
  • define the contents of a DataFrame using the SQLContext
  • define the interquartile range of a dataset and enumerate its properties
  • define the mean of a dataset and enumerate its properties
  • delete a Google Cloud Dataproc cluster and all of its associated resources
  • deploy DynamoDB in the Amazon Web Services cloud
  • describe and apply the different techniques involved in handling datasets where some information is missing
  • describe NoSQL stores and how they are used
  • describe the concept of a hierarchical index or multi-index and why it can be useful
  • describe the ETL process and different tools available
  • describe the options available when iterating over 1-dimensional and multi-dimensional arrays
  • draw the shape of a Gaussian distribution and enumerate its defining properties
  • edit individual cells and entire rows and columns in a Pandas DataFrame
  • execute the application and verify that the filtering has worked correctly; examine the job and the output files using the YARN Cluster Manager and HDFS NameNode web UIs
  • explain the concept of a hierarchical index or multi-index and why it can be useful
  • export the contents of a DataFrame into files of various formats
  • identify different tools available for data management
  • identify the various GCP services used by Dataproc when provisioning a cluster
  • import and export data in R
  • initialize a Spark DataFrame from the contents of an RDD
  • install Pandas and create a Pandas Series
  • list the six phases of the data lifecycle
  • load data into a Redshift cluster from S3 buckets
  • read data from an Excel spreadsheet
  • read data from files and write data to files using the Python Pandas library
  • recall how Apache Zookeeper enables the HDFS NameNode and YARN ResourceManager to run in high-availability mode
  • recall the steps involved in building a MapReduce application and the specific workings of the Map phase in processing each row of data in the input file
  • recognize and deal with missing data in R
  • recognize the challenges involved in processing big data and the options available to address them such as vertical and horizontal scaling
  • retrieve specific parts of an array using row and column indices
  • run ETL scripts using Glue
  • run the application and examine the outputs generated to get the word frequencies in the input text document
  • set up a JDBC connection on Glue to the Redshift cluster
  • specify the configurations of the MapReduce applications in the Driver program and the project's pom.xml file
  • standardize a distribution to express its values as z-scores and use Pandas to generate a correlation and covariance matrix for your dataset (see the Pandas/NumPy sketch after this list)
  • transfer files from your local file system to HDFS using the copyFromLocal command
  • use fancy indexing with arrays using an index mask
  • use NumPy to compute statistics such as the mean and median on your data
  • use NumPy to compute the correlation and covariance of two distributions and visualize their relationship with scatterplots
  • use the dplyr library to load data frames
  • use the get and getmerge functions to retrieve one or multiple files from HDFS
  • use the ggplot2 library to visualize data using R
  • use the NumPy library to manipulate arrays and the Pandas library to load and analyze a dataset
  • compare the means of two independent samples using the independent t-test and of related samples using a paired t-test with the SciPy library (see the SciPy sketch after this list)
  • add new columns to a data frame using the mutate method
  • work with the YARN Cluster Manager and HDFS NameNode web applications that come packaged with Hadoop
  • write a simple bash script
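
The standardization and correlation objectives above can be previewed with a short example. The following is a minimal sketch, assuming pandas and NumPy are installed; the hours/score dataset is fabricated purely for illustration and is not exam material.

    import numpy as np
    import pandas as pd

    # Hypothetical dataset: study hours and exam scores with some noise.
    rng = np.random.default_rng(0)
    hours = rng.uniform(1, 10, size=100)
    score = 50 + 4 * hours + rng.normal(0, 5, size=100)
    df = pd.DataFrame({"hours": hours, "score": score})

    # Standardize each column to z-scores: (value - mean) / standard deviation.
    z = (df - df.mean()) / df.std()
    print(z.mean(), z.std())          # means near 0, standard deviations near 1

    # Pandas generates correlation and covariance matrices directly.
    print(df.corr())                  # Pearson correlation matrix
    print(df.cov())                   # covariance matrix

    # NumPy equivalents for the two-column case.
    print(np.corrcoef(df["hours"], df["score"]))
    print(np.cov(df["hours"], df["score"]))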
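
For the hypothesis-testing objective, the sketch below shows both flavours of t-test in SciPy. It is a minimal example on fabricated data, assuming SciPy and NumPy are installed; group sizes and effect sizes are arbitrary.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    # Independent t-test: two unrelated groups, do their means differ?
    group_a = rng.normal(loc=70, scale=8, size=50)
    group_b = rng.normal(loc=74, scale=8, size=50)
    t_ind, p_ind = stats.ttest_ind(group_a, group_b)
    print(f"independent t-test: t = {t_ind:.3f}, p = {p_ind:.3f}")

    # Paired t-test: the same subjects measured twice (related samples).
    before = rng.normal(loc=100, scale=10, size=30)
    after = before + rng.normal(loc=2, scale=5, size=30)
    t_rel, p_rel = stats.ttest_rel(before, after)
    print(f"paired t-test:      t = {t_rel:.3f}, p = {p_rel:.3f}")

A small p-value in either test suggests the observed difference in means is unlikely under the null hypothesis of equal means.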

Overview/Description

Final Exam: Data Analyst will test your knowledge and application of the topics presented throughout the Data Analyst track of the Skillsoft Aspire Data Analyst to Data Scientist Journey.



Target

Prerequisites: none
